Thresholding Strategies for Text Classifiers: TREC 2005 Biomedical Triage Task Experiments

نویسندگان

  • Luo Si
  • Tapas Kanungo
چکیده

We participated in the triage task of biomedical documents in the TREC genomic track. In this paper we describe the methods we developed for the four triagesubtasks. Logistic regression and support vector machine algorithms were first trained to generate ranked lists of test documents. Then a subset of the test documents was identified as positive instances by selecting the top-k documents of the ranked lists. Deciding on the ideal value for k requires a good thresholding strategy. In this paper we first describe two thresholding strategies based on i) logistic regression and ii) support vector machines. In addition to these methods, we describe a thresholding method that combines the outputs from logistic regression and support vector machine by applying a joint thresholding strategy.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

MeSH Based Feedback, Concept Recognition and Stacked Classification for Curation Tasks

This paper reports about experiments carried out in the context of the genomics track at TREC 2004. Experiments were concentrated on two subtasks: the ad hoc retrieval task and the triage task. Experiments for the ad hoc task aimed at improving a standard full-text ad-hoc run (using a language modeling approach) by exploiting the manual classification of MEDLINE abstracts (the MeSH terms) for r...

متن کامل

Biomedical Document Triage Based on Figure Classification

The annotation task in model organism databases is to assign attributes, such as Gene Ontology (GO) codes, to biological entities, such as genes and proteins based on the evidence found in documents or other resources. Document triage precedes an annotation task; it identifies relevant documents that can support the annotation process. Annotation in organism databases involves manual efforts of...

متن کامل

Integrating image data into biomedical text categorization

Categorization of biomedical articles is a central task for supporting various curation efforts. It can also form the basis for effective biomedical text mining. Automatic text classification in the biomedical domain is thus an active research area. Contests organized by the KDD Cup (2002) and the TREC Genomics track (since 2003) defined several annotation tasks that involved document classific...

متن کامل

DIMACS at the TREC 2005 Genomics Track

This report describes DIMACS work on the text categorization task of the TREC 2005 Genomics track. Our approach to this task was similar to the triage subtask studied in the TREC 2004 Genomics track. We applied Bayesian logistic regression and achieved good effectiveness on all categories. 1. TEXT CATEGORIZATION TASK The Mouse Genome Informatics (MGI) project of the Jackson Laboratory provides ...

متن کامل

The TREC 2004 genomics track categorization task: classifying full text biomedical documents

BACKGROUND The TREC 2004 Genomics Track focused on applying information retrieval and text mining techniques to improve the use of genomic information in biomedicine. The Genomics Track consisted of two main tasks, ad hoc retrieval and document categorization. In this paper, we describe the categorization task, which focused on the classification of full-text documents, simulating the task of c...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005